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Abstract 

We present triqs/cthyb, a state-of-the art open-source implementation of the continuous¬ 
time hybridisation expansion quantum impurity solver of the TRIQS package. This code 
is mainly designed to be used with the TRIQS library in order to solve the self-consistent 
quantum impurity problem in a multi-orbital dynamical mean field theory approach to 
strongly-correlated electrons, in particular in the context of realistic calculations. It is 
implemented in C-I--I- for efhciency and is provided with a high-level Python interface. 
The code is ships with a new partitioning algorithm that divides the local Hilbert space 
without any user knowledge of the symmetries and quantum numbers of the Hamilto¬ 
nian. Furthermore, we implement higher-order configuration moves and show that such 
moves are necessary to ensure ergodicity of the Monte Carlo in common Hamiltonians 
even without symmetry-breaking. 
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Carlo, C++, Python 

External routines/libraries: TRIQS, cmake. 

Nature of problem: 

Accurate solvers for quantum impurity problems are needed in condensed matter theory. 
Solution method: 

We present an efficient C++/Python open-source implementation of a continuous-time hybridiza¬ 
tion expansion solver. 

1. Introduction 

Strongly-correlated quantum systems are a central challenge for theoretical condensed 
matter physics with a wide range of remarkable phenomena such as metal-insulator 
transitions, high-temperature superconductivity, and magnetism. In the last two decades, 
tremendous progress has been made in the field of algorithms for the quantum many- 
body problem, both in refining existing techniques and in developing new systematic 
approximations and algorithms. Among these methods is dynamical mean-field theory 
(DMFT) [1, 2] and its cluster [3] or diagrammatic extensions [4, 5, 6]. DMFT methods 
can also be combined with more traditional electronic structure methods such as density 
functional theory leading to ab initio realistic computational techniques for strongly- 
correlated materials [2]. 

Quantum impurity models play a central role in DMFT methods. From the computa¬ 
tional point of view, they are normally the bottleneck. The study of modern algorithms 
for these problems, and their implementation is therefore of great importance in the field. 
In the last decade, the continuous-time quantum Monte Carlo methods have emerged and 
become well-established, starting with the work of A. Rubtsov et al. [7], P. Werner et 
al. [8, 9, 10], and E. Gull et al. [11, 12]. In the continuous-time quantum Monte Carlo 
algorithm based on the hybridization expansion (CT-HYB) [8, 10], the Monte Carlo 
performs a systematic expansion in the hybridization of the impurity to the bath. In 
the last years, several improvements have been proposed to optimise such algorithms 
[13, 11, 14, 15], which in particular have greatly improved our capability to solve multi¬ 
orbital models (e.g., 3 or 5 bands) that are crucial in realistic DMFT calculations. 

In this paper, we present an implementation of the CT-HYB algorithm with state- 
of-the-art improvements and optimisations. This code, triqs/cthyb, is part of the TRIQS 
family, and is based on the TRIQS library [16]. It is released under the Free Software 
GPLv3 license. The algorithm can a priori handle any type of quantum impurity models 
and any interaction form. The code implements an optimisation of the atomic trace 
computation using a balanced left-leaning red-black tree, following the pioneering work 
of E. Gull [11], in combination with some controlled truncation of the trace computation 
[15] and ‘quick-abandon’ procedure [17]. In Sec. 5, we will indeed show explicitly that 
Gull’s balanced tree algorithm drastically improves the scaling of the code for a five 
band model, in which the computation time increases only very mildly when temperature 
decreases. 

In addition, we present and implement a new algorithm to divide the local Hilbert 
space of the quantum impurity to accelerate matrix products of the trace computation, 
without the need to specify its symmetry or a set of quantum number operators. This 
is particularly convenient in complex, multi-orbital contexts, where the set of quantum 
numbers may not be obvious. For example, in [18] an additional set of quantum numbers 
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was discovered for the Kanamori Hamiltonian, leading to a significant efficiency gain. 
With the present algorithm, these quantum numbers are no longer needed as user input. 
In many cases, this algorithm will lead to a more thorough Hilbert space decomposition 
than using only the well-known quantum numbers. 

Let us note that this code is a replacement for the previous implementation of the 
CTHYB package. Indeed all the implemented improvements give strictly the same Markov 
chain as our previous implementation, so it is the same Monte Carlo algorithm, but is 
much faster in practice. We discuss the speed improvement in more detail in Sec. 5. 

For systems with broken symmetry, it was shown that higher-order configuration 
moves were crucial to ensure ergodicity of the algorithm and hence give correct results. 
We implement such moves in the quantum Monte Carlo and find that such moves are 
required to ensure ergodicity even in the absence of symmetry-breaking [19]. An example 
of such a case is given in Sec. 6. 

This paper is organized as follows. In Sec. 2, we first show in a simple example how to 
use the solver from its high-level Python interface. In Sec. 3, we give a general overview of 
the CT-HYB formalism. We discuss partitioning schemes in Sec. 4. We briefly describe 
the tree-based optimisation of the dynamical trace computation in Sec. 5. In Sec. 6, we 
discuss ergodicity of the algorithm and the need for higher-order configuration updates. 
We give detail on how to obtain and install the package in Sec. 7, and finally conclude 
in Sec. 8. 


2. Usage 

The CTHYB application is built on the TRIQS library [16], which must be pre-installed. 
The interface is very simple. The user initialises the Solver object, sets the input quan¬ 
tities, and finally calls the solver. The resulting output quantities can thereafter be 
analysed and manipulated by means of the TRIQS library.^ 

Along the lines of the TRIQS library, the core of the cthyb solver presented here is 
written in C-I--I- for efficiency, with a Python wrapper for ease of use. Hence the solver can 
be used directly from C-I--I-, in a Python script or interactively in an IPython notebook. 

Below we show an example of a Python script implementing a DMFT loop for a 
five-band system on the Bethe lattice with fully rotationally-invariant interactions. 


Listing 1 Example script to run DMFT on a five-band model with fully rotationally-invariant 
Slater interactions 

from pytriqs.gf.local import * 

from pytriqs.archive import HDFArchive 

from pytriqs.applications.impurity_solvers.cthyb import Solver 
import pytriqs.operators.util as op 
import pytriqs.utility.mpi as mpi 

# General parameters 

filename = ’slater_5_band.h5’ # 

beta =100.0 # 

1 = 2 # 

n_orbs = 2*1 +1 # 

U = 5.0 # 


Name of file to save data to 
Inverse temperature 
Angular momentum 
Number of orbitals 
Screened Coulomb interaction 


^The interface described above is similar to that of of the simpler CT-INT algorithm presented as an 
example in Ref. 16. 
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J = 0.1 

half_bandwldth = 1.0 

mu = (U/2.0)*((2.0*n_orbs) -1.0) \ 

- (5.0*J/2.0)*(n_orbs-1.0) 
spin_names = [ ’up' , ’down ’ ] 

orb_names = for i in range (n_orbs) ] 

off_diag = True 
n_loop = 5 


# Hund’s coupling 

# Half bandwidth 

# Half-filling chemical potential 

# Outer (non-hybridizing) blocks 

# Orbital indices 

# Include orbital off-diagonal elements 

# Number of DMFT self-consistency loops 


# Solver parameters 

p = o 

p[" n_warmup_cycles" ] = 10000 

p[" n_cycles" ] = 10000 

p[" length_cycle "] = 100 
p [ " inove_double " ] = True 


# Number of warmup cycles to equilibrate 

# Number of measuresments 

# Number of MC steps between consecutive measures 

# Use four“operator moves 


# Block structure of Green’s functions 

# gf_struct = {’up’:[0,l,2,3,4], ’down’:[0,l,2,3,4]} 

# This can be computed using the TRIQS function as follows: 

gf_struct = op.set_operator_structure(spin_names,orb_names,off_diag=off_diag) 


# Construct the 4-index U matrix U_-Cijkl} 

# The spherically - symmetric U matrix is parametrised by the radial integrals 

# F_0 , F_2 , F_4 , which are related to U and J. We use the functions provided 

# in the TRIQS library to construct this easily: 

U_mat = op.U_matrix(1=1, U_int=U, J_hund=J, basis= ’spherical’ ) 


# Set the interacting part of the local Hamiltonian 

# Here we use the full rotationally-invariant interaction parametrised 

# by the 4-index tensor U_{ijkl}. 

# The TRIQS library provides a function to build this Hamiltonian from the U tensor: 
H = op.h_int_slater(spin_names,orb_names,U_mat,off_diag=off_diag) 


# Construct the solver 

S = Solver(beta=beta, gf_struct=gf_struct) 


# Set the hybridization function and G0_iw for the Bethe lattice 
delta_iw = GfImFreq(indices=orb_names, beta=beta) 

delta_iw << (half_bandwidth/2.0)**2 * SemiCircular(half_bandwidth) 
for name, gO in S.G0_iw: gO << inverse(i0mega_n + mu - delta_iw) 


# Now start the DMFT loops 
for i_loop in range(n_loop): 

# Determine the new Weiss field G0_iw self-consistently 
if i_loop > 0: 

g_iw = GfImFreq(indices = orb_names , beta = beta) 

# Impose paramagnetism 

for name, g in S.G_tau: g_iw << g_iw + (0.5/n_orbs)♦Fourier(g) 

# Compute S.G0_iw with the self-consistency condition 
for name, gO in S.G0_iw: 

gO << inverse(iQmega_n + mu - (half_bandwidth/2.0)**2 * g_iw ) 

# Solve the impurity problem for the given interacting Hamiltonian and set of parameters 

S.solve(h_int=H, ♦♦p) 

# Save quantities of interest on the master node to an h5 archive 
if mpi.is_master_node(): 

with HDFArchive(filename, ’a’ ) as Results: 

Re suit s [’ G0_iw-“/o s ’ “/o i-loop] = S.G0_iw 
Re suits [’ G_taus i-loop] = S.G_tau 


Input parameters 

The basic usage of the solver involves three steps: first, the construction of a Solver 
object, second, the setting of the input Weiss held G 0 _iw, and hnally, the invoking of the 
solve() method. We describe below the necessary parameters for each of these steps: 
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• s = SoiverCbeta, gf_struct) (line 47 in Listing 1) : Here we construct a Solver instance 
called s with the following parameters: 

— beta: The inverse temperature j3. 

— gf.struct: The structure of the block Green’s function (siockCf class, see 
TRIQS library documentation) given as a Python dictionary {block:inner}, 
where block is a string and inner is a Python list of indices within block. 

This ensures that all quantities within Solver are correctly initialised, and in par¬ 
ticular that the block structure of all Green’s function objects is consistent. 

• s.GO.iw «... (line 52 in Listing 1): Here we assign the initial Weiss field Go as 
given by Eq. 3. 

• s.soive(**parains) (line 67 in Listing 1): Here we solve the impurity problem for the 
set of parameters params given as a Python dictionary. Amongst these parameters 
are: 

— h_int: This is a TRIQS Operator object (see Sec. 8.7 in Ref. 16) consisting of the 
interaction terms in the local Hamiltonian (i.e., those containing more than 
two operators). In Listing 1, we use the full rotationally-invariant Hamiltonian 



See the TRIQS library documentation for the pytriqs.operators.utii.u.matrix 
and pytrlqs. operators .util .hamlltonlans modules. 

— n.cycies: This is the number of quantum Monte Garlo measurements made. 
In the case of an MPI calculation, the total number of measurements will be 
n.cycies for each core. 

We refer the reader to the online documentation for a full and regularly updated 
listing of all other Solver initialisation and solve () parameters. 

2.2. Output data 

After the solve () method has completed, several output quantities are available for 
analysis. These include: 

• s.G.tau: The local impurity Green’s function in imaginary time G(t). 

• s.G.iw: The local impurity Green’s function in imaginary (Matsubara) frequencies 
G{iujn). This is computed after the soiveO as the Fourier transform of G(r) if the 
parameter perform.post.proc is True. The high-frequency expansion coefficients are 
fitted according to user-provided parameters. 

• s.sigma.iw: The local impurity self-energy in imaginary (Matsubara) frequencies 

This is computed from s.GO.iw and s.G.iw by solving Dyson’s equation if 
the parameter perform_post_proc is True. The high-frequency expansion coefficients 
are fitted according to user-provided parameters. 
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• s.G_i: The local impurity Green’s function accumulated in Legendre polynomials 
as described in Ref. 20. In order to measure s.G_i, the parameter measure_g_i must 
be set to True in the solve () parameters. 


3. Hybridization expansion of partition and Green’s function 


Here we outline the principles of the continuous-time hybridization expansion formal¬ 
ism [8, 9, 13, 10] briefly. We refer the interested reader to the aforementioned references 
for more detailed derivations. 

The partition function of the impurity model is given by 


Z = 


J 'Dc^'Dcexp{—S). 


( 1 ) 


The action S takes the form 

S = -ff dTdT'^cj,(r)Go^^(T,T')c^(T') + / drUint, (2) 

JJo Jo 

where 


— (i^n ^a/3 (3) 

and Aaisiiujn) is defined such that it vanishes at high frequencies. Certain symmetries 
of the action allow the Green’s function to be reduced into a block diagonal form with 
blocks labelled by a ‘block’ index. The most common such decomposition is by spins, 
denoted by a. The orbital can then be a further, ‘inner’, index a within the spin block. 
The index a then refers to the pair {a, a). 

Let us define T as time ordering operator, the local Hamiltonian 

H\oc = Hint + 'y ' ( 4 ) 

and M such that [M~^]ij = The partition function is expanded in powers 

of the hybridization function as 




k>0 i—1 

with the quantum Monte Carlo Markov chain weights given by 

w{k, {oj, a', Tj, rj}) = ^ Jet^^ Tr (Ti)c„- (r')^ . 


( 5 ) 

( 6 ) 


The partition function Z and the average of any function / over the space of sampled 
configurations C = (fc, {aj, a', rj}) are then given by 


z = Y^ic), 

c 

(/(C)) = |5]^C)/(C). 
c 
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( 7 ) 


( 8 ) 


Specifically, the imaginary-time Green’s function can be measured as 

Gafsir) = [sgn(w(C)) Y M^jS{Ti - rj -h T)(5„,a(5a'/3] ■ (9) 

^ C ij 

Alternatively, the Green’s function can be accumulated in Legendre polynomials fol¬ 
lowing the prescription given in Ref. 20. 

In CT-HYB algorithms, the computational bottleneck for the multi-orbital cases gen¬ 
erally comes from the trace calculation for k operators in Eq. 6, which needs to be recom¬ 
puted for each new configuration. On the other hand, the determinants can be efficiently 
computed even for moderately large k. We reduce this computational cost in two main 
ways. 

As shown in Ref. 13, the trace computation can be optimised by decomposing the lo¬ 
cal Hilbert space into smaller subspaces such that there is a one-to-one mapping between 
subspaces under the application of a creation or annihilation operator. As a result, the 
matrices whose products enter the trace can be of significantly lower dimensionality. Usu¬ 
ally, this decomposition of the Hilbert space relies on user input of the local symmetries 
through quantum numbers of the Hamiltonian. We have devised a novel algorithm that 
accomplishes this partitioning automatically, without any a priori knowledge of quantum 
numbers of the system, giving a more efficient decomposition. This algorithm is detailed 
in Sec. 4. 

The second amelioration that reduces the computational burden of trace calculation 
of each configuration is, as first suggested by Gull [11], the use of a tree structure to cache 
parts of the trace calculation that are unchanged between configurations. The bounding 
properties of the trace [17, 15] help further reduce the time spent in the trace calculation. 
These improvements are the subject of Sec. 5. 

We also note that this algorithm allows for the treatment of all forms of interac¬ 
tion, including using a rotationally-invariant impurity Hamiltonian parametrised by the 
4-index interaction matrix Uap^s- However, for more complex interactions, the sign prob¬ 
lem can become a hindrance. If the interactions are of purely density-density form, the 
GT-HYB segment picture [8] can be used at greatly reduced computational cost. 

4. Partitioning the local Hilbert space 

The calculation of the dynamical trace boils down to multiplication of matrices. It is 
then crucial to find a way to minimise the sizes of the matrices being multiplied. This 
goal can be achieved by partitioning the local Hilbert space H of dimension N into a 
direct sum of K subspaces Hk each of dimension 0 < Nk < N. 

We wish to find a permutation of the basis vectors such that (1) the local Hamiltonian 
is block-diagonal, and (2) all and operators are block matrices with at most one 
non-zero block in each row and column. Such a permutation would group the basis states 
belonging to the same subspace Hk together. 

We implement two strategies to do the partitioning: with user-supplied quantum 
numbers and automatically. The latter strategy is more universal and chosen by default. 
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4-.1. Quantum numbers 

This is the traditional approach to partitioning of the Hilbert space [13]. The user 
provides a list of integrals of motion (operators) such that the Qt can be 

expressed as a function of the density operators Ua ■ Expectation values of these operators 
are calculated for each Fock state which gives a combination of quantum numbers 
associated with the state: 

{'ip\Qi\ip},...,{'ip\QL\'ip) => qi,...,qL- 

All states sharing the same set of quantum numbers belong to the same subspace. 

Condition (1) is obviously fulfilled by the obtained subspaces, as by definition, the 
Hamiltonian cannot connect states with different quantum numbers. As each operator 
Qi corresponds to a function of the operators Ua , condition (2) is also fulfilled. 

In many cases this approach works well, but it requires the quantum numbers and 
hence some prior analysis of the local Hamiltonian. It may be difficult to discover an 
exhaustive set of integrals of motion if the dimension of the local Hilbert space is large 
and the interaction form is complex [18]. 

4--2. Automatic partitioning 

The automatic partitioning algorithm employs no additional a priori information 
about the symmetry of Hamiltonian H. The only input data are the full set of basis Fock 
states and the Hamiltonian itself. The algorithm consists of two sequential phases. In the 
first phase, the finest possible partition, which satisfies condition (1) alone is constructed. 
In the second phase, this partition is modified to additionally satisfy condition (2). 

Phase 1 [Fig. 1] 

To start, one creates a data structure, which stores information about the way N basis 
states are partitioned into a number of subsets. Initially each basis state resides alone 
in its own subset. In the main loop of the algorithm, the Hamiltonian is sequentially 
applied to each basis state (initial state). The application gives a linear combination 
of the basis states with only a few non-zero coefficients, since H is usually sparse. The 
algorithm iterates in an inner loop over all basis vectors with non-zero coefficients (final 
states). If the initial and final state reside in different subsets, these subsets are merged 
together. Once the main loop is over, the partition of the basis is done. Two basis vectors 
are guaranteed to be in different subsets if they cannot be reached from each other by 
application of H any number of times. All matrix elements of H are calculated along 
the way and are stored for later use. 
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Yes 



Figure 1: (Color online) Phase 1 of the automatic partitioning algorithm. A partition of 
the local Hilbert space satisfies condition (1), as described in the text, at the end of this 
phase. 

Phase 2 [Fig. 2] 

During this phase some subsets are additionally merged to satisfy condition (2). This 
part of the algorithm is executed in turn for all cj),, Cq, pairs, and for each pair it proceeds 
as follows. 

Two lists of subspace-to-subspace connections are first generated: M = {Ha —>■ Hb} 
for cj,, and M_ = {Ha —>■ "Hb } for Ca. These lists are created by direct applications of the 
corresponding operators to all Fock states to determine, in which subspaces the resulting 
wave function has nonzero amplitudes. 

Then, the algorithm recursively iterates over all connections in M and M, and merges 
some subspaces following a special ‘zigzag’ visiting pattern. Let us consider a tree-like 
structure with subspaces being the nodes of the tree, and connections being the edges. 
We are interested in a tree generated by a sequence of operators cj,, CacJ,,, cJ,,Cac|,,... 
from a randomly chosen root subspace Hr. The recursive procedure starts at the root 
and traverses the tree in the depth-first order. Simultaneously it perform two actions: 

• Removes visited connections from M (edges from an odd level nodes), or from M 
(edges from an even level nodes). 

• Merges all odd level subspaces with the root subspace and all even level subspaces 
with the second level subspace. 
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If the tree has been fully traversed, but M is still not empty, another connection is 
picked from M. This connection serves as a root—)■ first level branch of the next tree to 
be traversed. 

The proposed ‘zigzag’ traversal procedure squeezes every tree to a pair of nodes 
connected by one edge. As every connection is visited only once, this is a computationally 
economic way to ensure fulhllment of condition (2). 
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Figure 2: (Color online) Phase 2 of the automatic partitioning algorithm. A partition of 
the local Hilbert space satisfies condition (2), as described in the text, after this phase 
has been applied to all operator pairs cl^,Ca- 

An important remark should be made about the data structure which maintains the 
partition oiH. We use a disjoint set data structure designed especially for quick find_set, 
union_set, and make_set operations [21, 22]. Thanks to two special techniques called 
‘union by rank’ and ‘path compression’, the amortized time per operation is only 0{a{n)). 
Here n is the total number of find_set, union_set, and make.set operations, and a{n) 
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is the extremely slowly-growing inverse Ackermann function, with a{n) < 5 for any 
practical value of n. 

A direct comparison of the two partitioning schemes shows that they discover equal 
numbers of subspaces in the cases of the 2-,3-,4-,5-,6- and 7-orbital Hubbard-Kanamori 
Hamiltonians (provided the quantum numbers of Ref. [18] are used in addition to N and 
Sz)- For the rotationally-invariant Slater Hamiltonians, however, the automatic partition 
algorithm is advantageous, as shown in Table 1. 


Model 

#subspaces, QN 

^subspaces. Automatic 

5 orbitals (cJ^Cq in spherical basis) 

36 

276 

7 orbitals (c)],Cq in spherical basis) 

64 

960 

5 orbitals (cj^Ca in cubic basis) 

36 

132 

7 orbitals (cj^Ca in cubic basis) 

64 

244 


Table 1: Comparison of the number of subspaces resulting from the use of quantum 
numbers (QN) and the partitioning algorithm (Automatic) presented in this paper for 
rotationally-invariant Slater Hamiltonians. 


5. Efficient trace calculation using a balanced tree and truncation 

Here we discuss the optimisation of the calculation of the atomic trace using a tree 
structure. We emphasize that the use of such a tree does not change the Markov chain, 
but is simply a method to reduce the computational cost of the trace computation. The 
principal idea behind the CT-HYB algorithm presented here is to use a balanced red- 
black tree to describe the configurations in the Markov chain. The use of a tree to store a 
configuration and partial products necessary in the trace evaluation reduces the number 
of matrix products that need to be performed when inserting or removing a collection 
of one or more operator pairs from the configuration. Hence, the cost of this limiting 
step of the algorithm is significantly reduced as compared to the naive ‘linear’ algorithm 
where the configuration is a linear chain of operators and all the matrix products are 
explicitly evaluated for each configuration. The gain in time per iteration between the 
tree and linear methods of computing the trace for a five-band system with three electrons 
and an rotationally-invariant interaction is shown in Fig. 3. Such a representation was 
first proposed by Gull [11]. Similar implementations based on skip lists have already 
appeared [15, 23] and are shown to be successful despite not mathematically guaranteeing 
logarithmic scaling. 
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Figure 3: (Color online) Scaling of computational time with the inverse temperature ^ 
for a system with three electrons in five bands in a Bethe lattice with a fully rotationally- 
invariant Hamiltonian for two algorithms: a) the linear case in which the trace is recom¬ 
puted from the full linear chain of operator matrices and b) the tree algorithm proposed 
here, in which the number of matrix products necessary is reduced. The overall scaling 
depends both on the scaling of the tree (O(log 2 /?)) and that of the determinants (0(/3^)). 
The same number of Monte Carlo steps is used in all calculations. 

Further efficiency gains are also made by using bound properties of the trace to quickly 
reject proposed moves [17]. Additionally, by summing over the blocks that contribute 
to the trace in order of increasing importance (i.e., increasing energies), one is able 
to truncate the trace evaluation once further contributions are smaller than machine 
precision [15]. 

Each configuration in the Markov chain is described by a tree with operators as nodes 
consisting of a key-value pair. The key, based on which the tree is sorted, is given by 
the imaginary time r of the operator. The value consists of the operator and the matrix 
product of the subtree (using a block structure). The tree for a configration is depicted 


in Fig. 4. 
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Figure 4: (Color online) The representation of a configuration by a tree. Each node 
represents an operator at the time that determines the node’s key. The nodes are sorted 
according to decreasing time to reflect the time-ordering of the trace. When the removal 
of the pair of operators at t = 33.88 and r = 6.65 (highlighted in green) is proposed, 
only the contents (specifically the partial product of matrices) of the nodes between 
these nodes up to the root node at t = 26.42 (highlighted in red) need to be updated. 
Specifically, the matrix stored at each node contains the product of the partial product 
on each of its subtree nodes, the operator of the node itself and the respective time 
evolution operators as shown in the hgure. 

We implement the left-leaning red-black tree (LLRBT), as explained in Refs. 24, 25. 
Our C-I--I- implementation of the LLRBT is adapted from the Java code given in Ref. 26. 

In particular, the basic implementation of a LLRBT was adapted to minimise the 
rebalancing of the tree for a Metropolis Monte Carlo algorithm in which attempted 
moves are not always accepted. At each proposed Monte Carlo step, we add and/or 
remove nodes in the tree and determine the corresponding trace without rebalancing the 
tree. If the proposed move is accepted, the tree is balanced. 

As seen in Fig. 4, the evaluation of a single insertion/removal requires at most log 2 (Ar) 
matrix products (i.e., the height of the tree from the root node to the deepest nodes) 
where K is the order of the configuration. The cost of the trace computation scales 
logarithmically with the perturbation order of the configuration rather than linearly in 
the linear algorithm. 

The average perturbation order of the partition function increases approximately 
linearly with the inverse temperature (3 as shown in Ref. 13 and in Fig. 5. At low /3, 
the cost of computing the trace dominates over that of the determinant. For sufficiently 
large /3, we expect that the computation of the determinants, which scales as AT^, to 
dominate in each Monte Carlo step. 
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Figure 5: (Color online) The perturbation order K increases approximately linearly with 
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Hence, the tree allows us to reach much lower temperatures than was possible with 
the naive linear algorithm for a fixed number of MC steps. A rough estimate of the 
scaling of the cost of five band calculations using both approximate Kanamori and fully 
rotationally-invariant Slater parametrisation of the interactions with /3 is shown in Fig. 6. 
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Figure 6: (Color online) The scaling of the cost of solving the impurity problem for 
Kanamori and fully rotationally-invariant Slater Hamiltonians with the inverse temper¬ 
ature (3. The same number of Monte Carlo steps is used in all calculations. 


6. Four-operator moves and ergodicity considerations 

Introduction of more complex moves beyond the commonly used insertion/removal 
of a single pair of operators were shown to be important in symmetry-broken cases [19]. 
We have implemented Monte Carlo moves in which four operators are simulataneously 
inserted/removed, and moreover, we show that such moves are crucial for proper sampling 
of the configuration space even in cases without symmetry-breaking. One example is a 
two-band Kanamori model with off-diagonal components in the hybridization function. 
In Fig. 7, we show the results of the calculation without double moves, with double moves 
and using exact diagonalisation [27]. When double moves are not used, the results are 
clearly incorrect. The use of double moves is necessary to arrive at the correct result. 
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Figure 7: (Color online) The off-diagonal component of G'(r) for a two-band Kanamori 
Hamiltonian computed using cthyb without double moves (red) differs from the exact 
diagonalisation [27] result (blue). Inclusion of double moves (green) corrects this error. 
The higher level of noise in the result with double moves reflects the low acceptance rate 
of this move. 


The ergodicity problem for such a multi-orbital Kanamori Hamiltonian in the ab¬ 
sence of four-operator moves can be understood. Let us assume that the hybridization 
function is spin-diagonal but has orbital off-diagonal components. Configurations such 
as c|^C 2 ^Ci^C 2 ^ yield a non-zero contribution to the trace for the two-particle sector of 
eigenstates, and more precisely, for the Hand’s singlet and triplet. However, they can 
never be reached when making only pair insertions. Insertion of the c\^, ci^ and 
C 2 ^ pairs is never proposed as the hybridization function is spin-diagonal. On the other 
hand, C 2 ^ and cii lead to a structural cancellation for all eigenstates due to 
conservation of the total number of operators on each orbital modulo 2. We have an ob¬ 
vious ergodicity problem and the Monte Carlo chain cannot reach such a configuration. 
On the other hand, it is clear that this configuration can be reached by a four-operator 
insertion. 

Here we discuss a concrete example of a configuration for which double moves are 
necessary in the Kanamori case. The existence of ergodicity problems for generic Hamil¬ 
tonians in such algorithms is still an open issue. For example, it has not yet been 
proven that four-operator insertions completely avoid ergodicity problems in a five-band 
rotationally-invariant Slater Hamiltonian. This is an important question that requires 
further investigation. 
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7 . Getting started 


7.1. Obtaining cthyb 

The CTHYB source code is available publicly and can be obtained by cloning the reposi¬ 
tory on the GitHub website at https://github.com/TRIQS/cthyb. As the TRIQS project 
is continuously improving, we recommend that users always obtain TRIQS and its ap¬ 
plications, including cthyb, from GitHub. Fixes to possible issues are also applied to the 
GitHub source. 

7.2. Installation 

Installing cthyb follows the same procedure as needed to install TRIQS. Here too, we 
use the cmake tool to conhgure, build and test the application. Assuming that TRIQS 
has been installed at /path/to/TRIQS/install/dir (refer to the online documentation), 
CTHYB is simply installed by issuing the following commands at the shell prompt: 

$ git clone https://github.com/TRIQS/cthyb.git src 
$ mkdir build_cthyb && cd build_cthyb 
$ cmake -DTRIQS_PATH=/path/to/TRIQS/install/dir ../src 
$ make 
$ make test 
$ make install 

This will install cthyb in the same location as TRIQS. Further installation instructions 
can be found in the online documentation. 

7.3. Citation policy 

We kindly request that the present paper be cited in any published work using the 
CTHYB solver. Furthermore, we ask that the TRIQS library on which the solver presented 
here is based also be cited [16]. This helps the cthyb and TRIQS developers to better keep 
track of projects using the library and provides them guidance for future developments. 

7. /. Contributing 

cthyb, as an application of TRIQS, is an open source project and we encourage feed¬ 
back and contributions from the user community. Issues should be reported exclusively 
via the GitHub website at https://github.com/TRIQS/cthyb/issues. For contributions, 
we recommend to use the pull request system on the GitHub website. Before any major 
contribution, we recommend coordination with the main cthyb developers. 

8. Summary 

We have presented the free software TRiqs/CTHYB, an implementation of the continuous¬ 
time hybridization expansion quantum Monte Garlo impurity solver. In addition to 
implementing the various improvements documented in the literature, we have also pre¬ 
sented and implemented a new algorithm to divide the local Hilbert space, removing 
the need for the user to explicitly provide the symmetry of the impurity Hamiltonian 
or its quantum numbers. We also discussed a case where ‘double moves’ in the quan¬ 
tum Monte Garlo are required in order to obtain correct results even in the absence 
of symmetry-breaking. Further developments (e.g., support for complex Hamiltonians, 
other measures) are planned for a future release. 
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